where $\lambda$ is a hyperparameter that balances the two terms. $H^l_c$ is the $c$th full-precision filter of the $l$th convolutional layer, and $\hat{H}^l_c$ denotes its corresponding reconstructed filter; $\mathrm{MSE}(\cdot)$ represents the mean squared error (MSE) loss. The second term minimizes the intraclass compactness, since the binarization process causes feature variations. $f_{C,s}(\hat{H})$ denotes the feature map of the last convolutional layer for the $s$th sample, and $\bar{f}_{C,s}(\hat{H})$ denotes the class-specific mean feature map for the corresponding samples. Combining $L_{\hat{H}}$ with the conventional loss $L_{CE}$, we obtain the final loss:
\begin{equation}
L = L_{CE} + L_{\hat{H}}.
\tag{4.18}
\end{equation}
The loss $L$ and its derivatives can be calculated directly using an efficient automatic differentiation package.
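As a concrete illustration, the following is a minimal PyTorch-style sketch of $L_{\hat{H}}$ and the final loss of Eq. (4.18). The function names (l_h_hat, total_loss), the placement of the balancing weight lam, and the use of per-class mean feature maps estimated within a mini-batch are illustrative assumptions, not the authors' exact implementation.

```python
import torch.nn.functional as F


def l_h_hat(fp_filters, rec_filters, features, labels, lam=1e-3):
    """Sketch of the kernel loss: filter-reconstruction MSE plus intraclass compactness."""
    # First term: MSE between each full-precision filter H^l_c and its
    # reconstructed counterpart \hat{H}^l_c, accumulated over layers.
    recon = sum(F.mse_loss(h_hat, h) for h, h_hat in zip(fp_filters, rec_filters))

    # Second term: pull the last-layer feature map f_{C,s} of every sample toward
    # the class-specific mean feature map. Class means are estimated within the
    # mini-batch here (an assumption made for this sketch).
    compact = features.new_zeros(())
    for c in labels.unique():
        class_feats = features[labels == c]
        class_mean = class_feats.mean(dim=0, keepdim=True)
        compact = compact + ((class_feats - class_mean) ** 2).sum()
    compact = compact / features.size(0)

    # lam plays the role of the balancing hyperparameter lambda; exactly where it
    # is applied is an assumption in this sketch.
    return recon + lam * compact


def total_loss(logits, labels, fp_filters, rec_filters, features):
    """Eq. (4.18): L = L_CE + L_H_hat; autograd supplies the derivatives."""
    return F.cross_entropy(logits, labels) + l_h_hat(
        fp_filters, rec_filters, features, labels
    )
```

Because both terms are built from differentiable tensor operations, calling backward() on the returned loss yields the gradients mentioned above without any hand-derived formulas.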
4.3.5 Ablation Study
We tested different values of $\beta_P$ for our method on the CIFAR-10 dataset, as shown on the right side of Fig. 4.9. We can see that as $\beta_P$ increases, the accuracy first increases but then decreases when $\beta_P \geq 2$. This validates that the performance loss between the Child and Parent models is a significant measure for the 1-bit CNN search. As $\beta_P$ increases, CP-NAS tends to select architectures with fewer convolutional operations, and the imbalance between the two terms in our CP model leads to a performance drop.
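To make the role of $\beta_P$ in this ablation concrete, the sketch below combines a Child model's accuracy with a $\beta_P$-weighted Parent-Child performance gap when scoring a candidate operation. The function name operation_score, the max-based gap, and the numeric values are illustrative assumptions, not the exact CP-NAS criterion.

```python
def operation_score(child_acc, parent_acc, beta_p):
    # Performance measure of the Child model, penalized by the beta_P-weighted
    # performance loss between the Parent and the Child (a simplified, assumed form).
    performance_loss = max(parent_acc - child_acc, 0.0)
    return child_acc - beta_p * performance_loss


# With beta_p = 0 the score reduces to the performance measure alone, which is how
# BNAS† is approximated in the ablation of Fig. 4.9.
print(operation_score(child_acc=0.92, parent_acc=0.94, beta_p=1.0))  # 0.90
```

Under this reading, a large $\beta_P$ heavily penalizes any operation whose Child model falls behind the Parent, which is consistent with the tendency toward fewer convolutional operations noted above.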
We also compare the architectures obtained by CP-NAS, Random, PC (PC-DARTS), and BNAS† as shown in Fig. 4.9. Unlike the full-precision case, Random and PC-DARTS lack the necessary guidance and therefore perform poorly for binarized architecture search. Both BNAS† and CP-NAS use an evaluation indicator for operation selection. In addition, our CP-NAS also employs the performance loss, which allows it to outperform the other three strategies.
Efficiency. As shown in XNOR, 1-bit CNNs are very efficient and promising for resource-limited devices. Our CP-NAS achieves performance comparable to that of the full-precision hand-crafted model, with up to an estimated 11× memory saving and 58× speedup, which is worth further research and will benefit extensive edge computing applications.
[Two panels: accuracy (%) on CIFAR-10 versus $\beta_P$ (0–5), and accuracy (%) for the search strategies Random, PC, BNAS†, and CP-NAS.]
FIGURE 4.9
The results (right) for different $\beta_P$ on CIFAR-10. The 1-bit CNN results (left) for different search strategies on CIFAR-10, including random search, PC (PC-DARTS), BNAS†, and CP-NAS. We approximately implement BNAS† by setting $\beta_P$ to 0 in CP-NAS, which means that we only use the performance measure for operation selection.